The Kozak consensus sequence (also called the Kozak consensus or Kozak sequence) is a sequence that occurs on eukaryotic mRNA and has the consensus (gcc)gccRccAUGG, where R is a purine (adenine or guanine) located three bases upstream of the start codon (AUG), and the start codon is followed by another G.[1] The Kozak consensus sequence plays a major role in the initiation of translation.[2] The sequence is named after its discoverer, Marilyn Kozak.
This sequence on an mRNA molecule is recognized by the ribosome as the translational start site, from which the protein encoded by that mRNA is translated. The ribosome requires this sequence, or a possible variation (see below), to initiate translation. The Kozak sequence should not be confused with the ribosomal binding site (RBS), which is either the 5' cap of a messenger RNA or an internal ribosome entry site (IRES).
In vivo, this site is often not matched exactly on different mRNAs, and the amount of protein synthesized from a given mRNA depends on the strength of the Kozak sequence.[3] Some nucleotides in this sequence are more important than others: the AUG is most important because it is the actual initiation codon, encoding a methionine at the N-terminus of the protein. (Rarely, CUG is used as an initiation codon, encoding a leucine instead of the typical methionine.) The A nucleotide of the AUG is referred to as number 1, and there is no number 0 position. For a 'strong' consensus, the nucleotides at positions +4 (G in the consensus) and -3 (either A or G in the consensus) relative to the number 1 nucleotide must both match the consensus. An 'adequate' consensus matches at only one of these positions, while a 'weak' consensus matches at neither. The cc at positions -1 and -2 is less conserved but contributes to the overall strength.[4] There is also evidence that a G in the -6 position is important in the initiation of translation.[2]
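The strong/adequate/weak rule above can be illustrated with a short Python sketch. The function name `kozak_strength` and its 0-based `aug_index` argument are hypothetical and chosen only for this example. The sketch checks positions -3 and +4 and nothing else, so it ignores the contributions of positions -1, -2 and -6 mentioned above.

```python
def kozak_strength(mrna: str, aug_index: int) -> str:
    """Classify the Kozak context of the AUG that starts at 0-based aug_index.

    Illustrative only: looks at positions -3 and +4 and returns
    'strong', 'adequate' or 'weak'.
    """
    # Accept DNA-style input as well by normalising T to U.
    seq = mrna.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")

    # The A of AUG is +1 and there is no position 0,
    # so position -3 lies three bases upstream of the A.
    minus3 = seq[aug_index - 3] if aug_index >= 3 else ""
    # A, U, G occupy +1, +2, +3, so +4 is the base right after the AUG.
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else ""

    # Count how many of the two key positions match the consensus.
    matches = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "adequate", "strong")[matches]
```

For example, `kozak_strength("GCCGCCACCAUGG", 9)` evaluates the full consensus, where both key positions match. A position that falls outside the given sequence is treated as a mismatch.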
There are examples in vivo of each of these types of Kozak consensus, and they probably evolved as yet another mechanism of gene regulation. Lmx1b is an example of a gene with a weak Kozak consensus sequence.[5] For translation to initiate from such a site, the mRNA sequence must contain other features that allow the ribosome to recognize the initiation codon.
Research has shown that a G→C mutation at the -6 position of the human β-globin gene (β+45) disrupted the haematological and biosynthetic phenotype. This was the first mutation found in the Kozak sequence. It was identified in a family from southeastern Italy whose members suffered from thalassaemia intermedia.[2]
Variations in the consensus sequence: (gcc)gccRccAUGG, AGNNAUGN, ANNAUGG, ACCAUGG, GACACCAUGG
Biota | Phylum | Consensus sequences |
---|---|---|
Vertebrate | | gccRccATGG[1] |
Fruit fly (Drosophila spp.) | Arthropoda | cAAaATG[6] |
Budding yeast (Saccharomyces cerevisiae) | Ascomycota | aAaAaAATGTCt[7] |
Slime mold (Dictyostelium discoideum) | Amoebozoa | aaaAAAATGRna[8] |
Ciliate | Ciliophora | nTaAAAATGRct[8] |
Malarial protozoa (Plasmodium spp.) | Apicomplexa | taaAAAATGAan[8] |
Toxoplasma (Toxoplasma gondii) | Apicomplexa | gncAaaATGg[9] |
Trypanosomatidae | Euglenozoa | nnnAnnATGnC[8] |
Terrestrial plants | | AACAATGGC[10] |